Collaboration data proxy system in cloud computing platforms

ABSTRACT

In various embodiments, methods and systems for enhanced access to storage data based on a collaboration data proxy system are provided. A plurality of metadata tables on one or more peer nodes are referenced for data corresponding to a data request of a requesting node. The metadata tables indicate availability of chunks of data in the one or more peer nodes. A determination is made that the data corresponding to the data request is downloadable from the one or more peer nodes; the determination is based on the metadata tables. A download operation configuration instance is generated for the data request of the requesting node. The download operation configuration instance comprises configuration settings for downloading data corresponding to the data request from the one or more peer nodes. A chunk of the data is downloaded from the corresponding one or more peer nodes where the chunk is located, using the configuration settings.

BACKGROUND

Cloud computing platforms may offer building, deployment, and management functionality for different types of applications and services. In this regard, cloud computing platforms may store large amounts of data for processing to implement the applications and services. In operation, several different client devices utilizing the applications and services can request the same file or portions of the same file simultaneously. Cloud computing platforms can implement limitations on access to the file for load balancing and other resource allocation purposes. As such, client devices may have to wait or receive exception errors for their requests because the cloud storage platform imposes limitations on access to the file.

SUMMARY

This summary is provided to introduce a selection of concepts in a simplified form that are further described below in the detailed description. This summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used in isolation as an aid in determining the scope of the claimed subject matter.

Embodiments described herein provide methods and systems for enhanced access to storage data based on a collaboration data proxy system. A plurality of metadata tables on one or more peer nodes are referenced for data corresponding to a data request of a requesting node. The metadata tables indicate availability of chunks of data in the one or more peer nodes. A determination is made that the data corresponding to the data request is downloadable from the one or more peer nodes; the determination is based on the metadata tables. A download operation configuration instance is generated for the data request of the requesting node. The download operation configuration instance comprises configuration settings for downloading data corresponding to the data request from the one or more peer nodes. A chunk of the data is downloaded from the corresponding one or more peer nodes, where the chunk is located, using the configuration settings. A download operation for the data can further be executed based on a long-tail mitigation routine comprising at least one of a contention avoidance workflow and an increased download throughput workflow.

BRIEF DESCRIPTION OF THE DRAWINGS

The present invention is described in detail below with reference to the attached drawing figures, wherein:

FIG. 1 is a block diagram of an exemplary operating environment in which embodiments described herein may be employed;

FIG. 2 is a schematic of an exemplary collaboration data proxy system, in accordance with embodiments described herein;

FIG. 3 is a schematic of an exemplary node in a collaboration data proxy system, in accordance with embodiments described herein;

FIG. 4 is a schematic of an exemplary long-tail mitigation component in a collaboration data proxy system, in accordance with embodiments described herein;

FIGS. 5A and 5B are tables of exemplary metadata entries in a collaboration data proxy system, in accordance with embodiments described herein;

FIG. 6 is a flow diagram showing an exemplary method for enhanced access to storage data in distributed storage systems, in accordance with embodiments described herein;

FIG. 7 is a flow diagram showing an exemplary method for enhanced access to storage data in distributed storage systems, in accordance with embodiments described herein; and

FIG. 8 is a block diagram of an exemplary computing environment suitable for use in implementing embodiments described herein.

DETAILED DESCRIPTION

The subject matter of embodiments of the invention is described with specificity herein to meet statutory requirements. However, the description itself is not intended to limit the scope of this patent. Rather, the inventors have contemplated that the claimed subject matter might also be embodied in other ways, to include different steps or combinations of steps similar to the ones described in this document, in conjunction with other present or future technologies. Moreover, although the terms “step” and/or “block” may be used herein to connote different elements of methods employed, the terms should not be interpreted as implying any particular order among or between various steps herein disclosed unless and except when the order of individual steps is explicitly described.

For purposes of this disclosure, the word “including” has the same broad meaning as the word “comprising.” In addition, words such as “a” and “an,” unless otherwise indicated to the contrary, include the plural as well as the singular. Thus, for example, the constraint of “a feature” is satisfied where one or more features are present. Also, the term “or” includes the conjunctive, the disjunctive, and both (a or b thus includes either a or b, as well as a and b).

For purposes of a detailed discussion below, embodiments are described with reference to a node and client device operating environment supported by a cloud computing platform. The node and client device operating environment includes several peer nodes processing data requests from client devices utilizing applications and services on the cloud computing platform. However, the methods described herein can be performed in different types of operating environments having alternate configurations of the functional components described herein. As such, the embodiments described herein are merely exemplary, and it is contemplated that the techniques may be extended to other implementation contexts.

A distributed storage system can be implemented on a cloud computing platform that runs cloud applications and services across different data centers and geographic regions. The cloud computing platform can implement a fabric controller component for provisioning and managing resource allocation, deployment/upgrade, and management of cloud applications and services. Typically, a cloud computing system acts to store data or run applications and services in a distributed manner. The application and service components of the cloud computing platform may include nodes (e.g., computing devices, processing units, or blades in a server rack) that are allocated to run one or more portions of applications and services.

When multiple applications and services are being supported by the nodes, the nodes may be partitioned into virtual machines or physical machines that concurrently run the separate service applications, respectively, in individualized computing environments that support the resources and/or operating system specific to each service application. Further, each application or service may be divided into functional portions such that each functional portion is able to run on a separate virtual machine. In cloud computing platforms, multiple servers may be used to run the applications and services to perform data storage operations in a cluster. In particular, the servers may perform data operations independently but be exposed as a single device referred to as a cluster. Each server in the cluster may be referred to as a node.

A storage service on the cloud computing platform can be a service supported using the fabric controller component. The storage service can be responsible for managing the replication and data placement across disks and load balancing the data and the application traffic with storage clusters. The storage service can be responsible for managing access to a high volume of storage. The storage service can implement a storage stamp as a cluster of N racks of storage nodes and a location service that manages the storage stamps. Specifically, the location service can allocate location stamps and manage them across the storage stamps for disaster recovery and load balancing.

A cloud computing platform supporting a storage service can support several different clients simultaneously requesting access to the same data (e.g., file or Binary Large Object (BLOB)) or portions of the same data (e.g., a chunk of data). Storage services can limit access to particular data for load balancing and other resource allocation purposes. For example, throughput to a blob in the cloud computing platform storage can be limited to a predefined number of megabits per second (e.g., 60 MB/s). Bandwidth limitations can be specifically associated with a storage account that stores the data to be accessed using a client device. As such, when several different clients attempt to simultaneously access the same data, the clients may have to wait or receive exception errors for their requests because the cloud storage platform imposes a bandwidth limit on the data.

Conventional cloud computing platforms may generate multiple copies of the data and store the data in several different locations in the cloud computing platform (e.g., nodes). In this regard, client devices can request the data from any one of the different locations storing the data. Such implementations may mitigate the issue of limited bandwidth to serve several client devices; however, such an implementation lacks the capacity to scale because a determination has to be made on the number of copies of the data that have to be generated and the duration for storing the data at the alternate locations. Moreover, conventional methods also do not include flexibility in downloading the data in the peer nodes because download operations are performed without regard to specific customer requirements, or the download operations lack the capacity to seamlessly adapt to specific customer requirements without code changes on the cloud computing platform.

Embodiments of the present invention provide simple and efficient methods and systems for providing enhanced access to data based on a collaboration data proxy system using metadata tables. The collaboration data proxy system provides access to data based on a global cache proxy framework that can be scaled using in-memory metadata tables. In particular, a collaboration data proxy service (“proxy service”) can implement metadata tables on nodes to manage data (e.g., files or blobs) as a plurality of data duplicates and store the data duplicates in different duplicate storage locations (e.g., peer nodes). In this regard, a client device via a client node (e.g., requesting node) can access the data using any of the plurality of data duplicates at the duplicate storage locations. Implementing a plurality of data duplicates in duplicate storage locations can mitigate access limitations (e.g., wait times and exceptions) associated with simultaneous access to storage data.

Embodiments described herein utilize a global cache proxy framework to support enhanced access to data. A client node can access data from a cloud computing platform (e.g., via a storage service) in a more efficient way using the global cache proxy framework. The global cache proxy framework can also be implemented as a flexible topology configuration. The flexible topology scheme adapts to specific business scenarios without need for coding changes in the global cache proxy framework. The global cache proxy framework can include a configurable cache strategy that operates as a Mostly Global Available (MGA) cache that increases cache efficiency. Embodiments described herein further include dynamic download strategies to address long tail issues associated with traditional peer download mechanisms. It is contemplated that the global cache proxy framework can operate as a plug and play implementation that can be functional with different types of cloud computing platform storage services.

A global cache proxy framework can be described by way of example, at a high level. In an exemplary embodiment, a customer account can be associated with a plurality of nodes (e.g., N1, N2, N3 . . . Nk) that are requesting access to the same data (e.g., blob or a file) in storage in the cloud computing platform. A node can be a worker role instance, a virtual machine, or a standalone computing device instance in a cloud computing platform implementation as described in more detail herein. Embodiments herein can be implemented based on an inter-communication prerequisite (e.g., an established communication channel) between the plurality of nodes. In other words, the nodes should be able to directly communicate with each other and the plurality of nodes should have network connectivity to the storage service on the cloud computing platform.

The global cache proxy framework may implement a global read-only cache proxy using the cloud computing platform. As such, when the plurality of nodes request data from a specific data location, the requests can be served using duplicate storage locations (e.g., peer nodes) that already have the data or a chunk of the data, where the chunk represents a portion of the data. In this regard, the peer nodes each operate as data cache proxies, such that, if a peer node has a duplicate of the requested data already downloaded, the requesting node would download the data from the peer node. It is contemplated that a request for data would go to the cloud computing platform storage when the peer nodes do not have the requested source data already downloaded. Data in the peer nodes and cloud computing platform that are part of the proxy service can be specifically identified for caching in the collaboration data proxy system. It is contemplated that some data transferred in the cloud computing platform system can be excluded from the collaboration data proxy system based on a designation associated with the data.

Retrieving the data from a peer node having the data stored in a data cache proxy improves accessibility of cloud computing data. For example, a client device would experience reduced latency and improved throughput because, in most cases, the amount of bandwidth available among the nodes can be higher than the bandwidth available through the cloud computing platform. Further, the cloud computing platform storage service can experience a reduced volume of concurrent transaction requests, which improves the operation of the cloud computing platform.

Accordingly, in a first embodiment described herein, a system for providing enhanced access to data in distributed storage systems is provided. The system includes a collaboration data proxy component configured for determining whether data corresponding to a data request of a requesting node is downloadable from one or more peer nodes, and downloading a chunk of the data from the one or more peer nodes, when the chunk is located on the one or more peer nodes. Downloading the data is based on a long-tail mitigation routine. The system further includes a long-tail mitigation component configured for executing the long-tail mitigation routine that facilitates downloading the chunk of data based on at least one of: a contention avoidance workflow and an increased download throughput workflow, where the contention avoidance workflow or the increased download throughput workflow adjusts download attributes of a download operation for the data. Download attributes refer to dynamically configurable characteristics specific to the download operation that are adjustable to implement a particular download strategy.

In a second embodiment described herein, one or more computer storage media having computer-executable instructions embodied thereon that, when executed by one or more processors, cause the one or more processors to perform a method for enhanced access to data in distributed storage systems are provided. The method includes referencing metadata tables corresponding to the one or more peer nodes for data corresponding to a data request. The metadata tables indicate availability of chunks of data in the one or more peer nodes. The method also includes determining that the data corresponding to the data request is downloadable from the one or more peer nodes. The determination is based on the metadata tables. The method further includes generating a download operation configuration instance for the data request, where the download operation configuration instance comprises configuration settings for downloading data corresponding to the data request. The method includes downloading a chunk of the data from the one or more peer nodes using the configuration settings of the download operation configuration instance.

In a third embodiment described herein, a computer-implemented method for enhanced access to data in distributed storage systems is provided. The method includes determining that a data request, associated with a requesting node, is directed to collaboration data. Collaboration data comprises data that is stored on at least one of: a cloud computing platform storage or one or more peer nodes in a collaboration data proxy system, and that has been identified for caching in the collaboration data proxy system. The method also includes referencing metadata tables corresponding to the one or more peer nodes for data corresponding to the data request. Referencing the metadata tables corresponding to the one or more peer nodes is based on an established communication channel with the one or more peer nodes. The method further includes determining that at least a chunk of the data corresponding to the data request is downloadable from the one or more peer nodes, where the determination is based on the metadata tables. The method also includes generating a download operation configuration instance for the data request of the requesting node, where the download operation configuration instance comprises configuration settings for downloading data corresponding to the data request from the one or more peer nodes. The method includes downloading a chunk of the data from the one or more peer nodes using the configuration settings of the download operation configuration instance, where downloading the chunk of data is based on a long-tail mitigation routine.

Referring now to FIG. 1, FIG. 1 illustrates an exemplary collaboration data proxy system (“data proxy system”) 100 in which implementations of the present disclosure may be employed. In particular, FIG. 1 shows a high level architecture of the data proxy system 100 in accordance with implementations of the present disclosure. It should be understood that this and other arrangements described herein are set forth only as examples. Other arrangements and elements (e.g., machines, interfaces, functions, orders, and groupings of functions, etc.) can be used in addition to or instead of those shown, and some elements may be omitted altogether. Further, many of the elements described herein are functional entities that may be implemented as discrete or distributed components or in conjunction with other components, and in any suitable combination and location. Various functions described herein as being performed by one or more entities may be carried out by hardware, firmware, and/or software. For instance, various functions may be carried out by a processor executing instructions stored in memory.

Among other components not shown, data proxy system 100 includes a cloud computing platform 110 having storage 112, and nodes 120, 130, and 140, each servicing corresponding client computing devices 122, 132, and 142, respectively. Each client computing device resides on any type of computing device, which may correspond to computing device 800 described with reference to FIG. 8, for example. The components of data proxy system 100 may communicate with each other over a network, which may include, without limitation, one or more local area networks (LANs) and/or wide area networks (WANs). Any number of nodes (e.g., servers) and client computing devices may be employed within the data proxy system 100 within the scope of implementations of the present disclosure.

In data proxy system 100 supported by the cloud computing platform 110, the nodes, such as nodes 120, 130, and 140, are utilized to store and provide access to data in the storage of the cloud computing platform. The cloud computing platform 110 also may be a public cloud, a private cloud, or a dedicated cloud. The cloud computing platform 110 may include a data center configured to host and support operation of endpoints in a particular service application. The phrase “application” or “service” as used herein broadly refers to any software, or portions of software, that run on top of, or access storage locations within, the datacenter. In one embodiment, one or more of the endpoints may represent the portions of software, component programs, or instances of roles that participate in the service application. Also, clients 122, 132, and 142 may be configured to provide applications with access to the data. Client 122 can be linked into an application or a service supported by the cloud computing platform 110.

Having described various aspects of data proxy system 100, it is noted that any number of components may be employed to achieve the desired functionality within the scope of the present disclosure. Although the various components of FIG. 1 are shown with lines for the sake of clarity, in reality, delineating various components is not so clear, and metaphorically, the lines may more accurately be grey or fuzzy. Further, although some components of FIG. 1 are depicted as single components, the depictions are exemplary in nature and in number and are not to be construed as limiting for all implementations of the present disclosure.

With reference to FIG. 2, FIG. 2 includes the global proxy framework 200 of the proxy system. The global proxy framework 200 and functionality supported therein can be described by way of an exemplary operating environment. The global proxy framework 200 can include a cloud computing platform storage 210 and a peer node group having a requesting node 220 and peer nodes 230 and 240. Each node can include several components that facilitate performing the functionality described herein. As shown in FIG. 2, the nodes include data proxy components 222, 232, and 242, respectively.

With reference to FIG. 3, FIG. 3 illustrates an exemplary node 300 that can include a data proxy component 310 having a data proxy operations component 320, a long-tail mitigation component 340, a flexible topology definition component 360, and a metadata table component 380. The data proxy operations component can further include a data downloader 322, a download controller 324, a peer sender 326, a peer communication manager 328, a connection acceptor 330, and a cache manager 332. Other arrangements of components (e.g., machines, interfaces, functions, orders, and groupings of functions, etc.) can be used in addition to or instead of those shown, and some elements may be omitted altogether. The global proxy framework 200 and node 300 are meant to be exemplary and not limiting to the functionality described herein.

In operation, the data proxy component 310 is configured to support the plurality of components identified therein. The data proxy component 310 provides enhanced access to data in a distributed storage system such as a cloud computing platform having a plurality of nodes requesting large amounts of data (e.g., blobs and files). As such, the data can be available in distinct chunks that facilitate communicating and transferring the data from one storage location to another.

The data proxy component 310 is responsible for receiving a data request associated with a requesting node and determining whether the data request from the requesting node is directed to collaboration data. The collaboration data refers to data that is stored in the cloud computing platform storage or one or more peer nodes. The requesting node can request data that is not part of the collaboration data proxy system, for example, data that is external to the cloud computing platform distributed storage (e.g., the cloud computing platform storage or peer nodes). As such, when the request is not directed to collaboration data, the data proxy component 310 can forward the data request to a data source corresponding to the data request. When the request is directed to collaboration data, the data proxy component 310 can process the data request based on availability of the data on the cloud computing platform distributed storage.
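The routing decision described above can be sketched as follows. This is a minimal illustration only; the class name `DataProxy`, the `collaboration_ids` set, and the returned labels are hypothetical and are not taken from the described embodiments.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class DataProxy:
    # Hypothetical registry of data identifiers designated as collaboration data.
    collaboration_ids: Set[str] = field(default_factory=set)

    def route(self, data_id: str) -> str:
        """Decide which path handles a data request."""
        if data_id in self.collaboration_ids:
            # Served from peer nodes or the cloud computing platform storage.
            return "collaboration"
        # Not collaboration data: pass the request through to its own data source.
        return "forward"


proxy = DataProxy(collaboration_ids={"blob-a"})
print(proxy.route("blob-a"))      # collaboration path
print(proxy.route("external-x"))  # forwarded to the original source
```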

The data proxy component 310 is further responsible for determining whether data corresponding to a data request of a requesting node is downloadable from one or more peer nodes. Determining that the data is available on peer nodes is based on referencing metadata tables. Metadata tables indicate availability of chunks of data on a corresponding node. Nodes are configured to maintain metadata tables as discussed in more detail herein. The data proxy component 310 on a node can reference the metadata tables corresponding to peer nodes using an established communication channel with the peer nodes. It is contemplated that a node maintains a metadata table with the most up-to-date information from peer nodes. The information can be communicated between the nodes. The data proxy component 310 downloads a chunk of the data from peer nodes, when the chunk is located on the one or more peer nodes. Downloading the data from the peer node can use a long-tail mitigation routine that is supported by the long-tail mitigation component 340.
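As an illustration of referencing metadata tables to decide where each chunk can be retrieved, the following sketch assumes a simple in-memory layout (node name, then data identifier, then the set of chunk indexes held). The layout and the function name `locate_chunks` are assumptions for illustration; the actual table format is not specified here.

```python
from typing import Dict, List, Set

# Assumed layout: node name -> data id -> set of chunk indexes held by that node.
MetadataTables = Dict[str, Dict[str, Set[int]]]


def locate_chunks(tables: MetadataTables, data_id: str,
                  total_chunks: int, requester: str) -> Dict[int, List[str]]:
    """Map each chunk index to the peer nodes that already hold it.

    An empty list means no peer has the chunk, so the request would go
    to the cloud computing platform storage instead.
    """
    plan: Dict[int, List[str]] = {}
    for chunk in range(total_chunks):
        plan[chunk] = sorted(
            node for node, entries in tables.items()
            if node != requester and chunk in entries.get(data_id, set())
        )
    return plan


tables: MetadataTables = {
    "N1": {"blob-a": {0, 1}},
    "N2": {"blob-a": {1}},
    "N3": {},
}
plan = locate_chunks(tables, "blob-a", total_chunks=3, requester="N3")
print(plan)
```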

The long-tail mitigation component 340 is responsible for supporting a long-tail mitigation routine. The long-tail mitigation routine can implement dynamic download strategies in the global cache proxy framework to resolve long tail issues experienced in traditional peer-to-peer download mechanisms and improve performance of the data cache proxy system. At a high level, a long tail issue arises in an operation that involves a large number of nodes requesting to download data (e.g., a file or blob) to a local storage associated with each node. In operation, a subset of the nodes may take a significantly longer period of time to finish the download operation than another subset of nodes that finish the download operation much faster. The larger the data being downloaded, the greater the discrepancy in download time between slower nodes and faster nodes.

By way of example, for sufficiently large data sizes, slower nodes can take up to three times longer to download the data than faster nodes. The difference between slower nodes and faster nodes can be attributed to differences or heterogeneity of hardware, the speed of the local media to which the data is written, and the distributed nature of cloud computing platforms. As such, each node independently determines, based on the factors described, how to download data (e.g., data chunks of a blob) and where the data can be retrieved from. The outcome inevitably involves a number of nodes attempting to retrieve the same data from a single source node, which eventually causes a bottleneck.

With reference to FIG. 4, embodiments described herein utilize the long-tail mitigation component 400 to implement a long-tail mitigation routine to reduce or eliminate the effect of the issues described above. Each node can be configured to support the long-tail mitigation routine with its own long-tail mitigation component 400. The long-tail mitigation routine can be implemented using two main workflows: the contention avoidance workflow and the increased download throughput on slow nodes workflow. These workflows correspond to a contention avoidance component 410 and an increased download throughput component 420 having phase identification 422, head phase 424, body phase 426, and tail phase 428 modules that support corresponding functionality described herein. The contention avoidance workflow and the increased download throughput workflow can be implemented individually and in combination to adjust download attributes of a download operation corresponding to the data.

The contention avoidance workflow refers to avoiding contention based on a chunk selection algorithm. Chunk selection based on the chunk selection algorithm can be altered to have a higher probability of choosing chunks that are not yet being downloaded by other nodes or that are being downloaded by the fewest number of nodes. By way of example, the algorithm may include reviewing a plurality of chunks being downloaded and the number of peer nodes downloading a particular chunk, and selecting a chunk for download based on the lowest number of peer nodes downloading the particular chunk. The chunk selection algorithm can further implement a download strategy where more network calls are associated with chunks of data having a lower number of concurrent nodes downloading the chunks of data and fewer network calls are associated with chunks of data having a higher number of concurrent nodes downloading the chunks of data. Other variations and combinations of the chunk selection algorithm are contemplated with embodiments described herein.
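One possible form of such a chunk selection algorithm is sketched below. The inverse weighting `1 / (1 + count)` and the function names are assumptions chosen for illustration; the description only requires that less-contended chunks be more likely to be chosen.

```python
import random
from typing import Dict, Optional


def selection_weights(in_flight: Dict[int, int]) -> Dict[int, float]:
    """Weight each chunk inversely to the number of nodes downloading it.

    The 1 / (1 + count) form is an illustrative assumption.
    """
    return {chunk: 1.0 / (1 + count) for chunk, count in in_flight.items()}


def pick_chunk(in_flight: Dict[int, int],
               rng: Optional[random.Random] = None) -> int:
    """Probabilistic variant: less-contended chunks are more likely to be picked."""
    rng = rng or random.Random()
    weights = selection_weights(in_flight)
    chunks = list(weights)
    return rng.choices(chunks, weights=[weights[c] for c in chunks], k=1)[0]


def least_contended(in_flight: Dict[int, int]) -> int:
    """Deterministic variant: the chunk with the fewest concurrent downloaders."""
    return min(in_flight, key=lambda c: (in_flight[c], c))


# chunk index -> number of peer nodes currently downloading that chunk
in_flight = {0: 3, 1: 0, 2: 1}
print(least_contended(in_flight))
print(selection_weights(in_flight))
```

The deterministic variant matches the "lowest number of peer nodes" example, while the weighted variant avoids every node converging on the same least-contended chunk at once.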

The increased download throughput workflow refers to employing different downloading strategies during different phases of the downloading. Download phases can include head, body, and tail. Each phase can correspond to a defined percentage of data chunks downloaded. A phase identification scheme facilitates identifying what phase of downloading a particular download operation is currently in. In particular, the phase can be dependent on the number of chunks that have been downloaded to the local requesting node compared to the total number of chunks of the requested data. The phase identification scheme can be implemented as a configurable threshold. In embodiments, the configurable threshold can have default values. For example, a head phase can correspond to a head threshold where the downloading process is beginning with about 0%-30% of the data downloaded, the body phase can correspond to a body threshold during a downloading process where about 30%-85% of the data has been downloaded, and the tail phase can correspond to a tail threshold where the downloading process is near the end with about 85% of the data downloaded.
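The phase identification scheme can be sketched as a simple threshold check. The default values below follow the example percentages above; the function name, parameter names, and the treatment of the exact boundary values (30% counted as body, 85% counted as tail) are assumptions.

```python
def identify_phase(downloaded_chunks: int, total_chunks: int,
                   head_threshold: float = 0.30,
                   tail_threshold: float = 0.85) -> str:
    """Classify a download operation as 'head', 'body', or 'tail'.

    Thresholds are configurable; boundary handling is an assumption.
    """
    if total_chunks <= 0:
        raise ValueError("total_chunks must be positive")
    fraction = downloaded_chunks / total_chunks
    if fraction < head_threshold:
        return "head"
    if fraction < tail_threshold:
        return "body"
    return "tail"


for done in (10, 50, 90):
    print(done, identify_phase(done, 100))
```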

With reference to the head phase, at the start of the download, all nodes may hold none of the source data. As such, the nodes can then attempt to retrieve random chunks from the cloud computing platform storage. During the head phase, the nodes can download different chunks of data from the cloud computing platform storage to seed the data cache proxy system. If it is determined that the nodes receiving the data are being throttled when downloading data from the cloud computing platform storage, a determination can then be made to aggressively back off or throttle downloading from the cloud computing platform storage to a defined low rate and trigger downloading from peer nodes instead. Throttling downloading can be based on reducing the number of network calls downloading the data from the cloud computing platform storage. Triggering downloading can include generating new network calls to peer nodes for downloading the data.

The head phase can also include the long-tail mitigation component 400 determining that no throttling is occurring at the cloud computing platform storage, for example after a predefined period of time. As such, each node may increase the number of calls to the cloud computing platform storage to download the data. An increase in the number of network calls can be gradual (over a period of time) or immediate (all at once). An increase can continue until either the node starts being throttled or the node download throughput is no longer increasing when increasing the number of calls to download the data.
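The head-phase adjustment of storage calls can be sketched as a small control step that is evaluated periodically. The function name, the `low_rate_calls` floor, and the one-call `step` are hypothetical parameters, not values from the described embodiments.

```python
def next_storage_calls(current_calls: int, throttled: bool,
                       throughput_improved: bool,
                       low_rate_calls: int = 1, step: int = 1) -> int:
    """Return the number of concurrent storage calls for the next interval.

    - Throttled by storage: back off aggressively to a defined low rate
      (freed capacity would be redirected to peer downloads).
    - Not throttled and throughput still rising: ramp up gradually.
    - Not throttled but throughput flat: hold the current level.
    """
    if throttled:
        return min(current_calls, low_rate_calls)
    if throughput_improved:
        return current_calls + step
    return current_calls


calls = 4
for throttled, improved in [(False, True), (False, True), (False, False), (True, False)]:
    calls = next_storage_calls(calls, throttled, improved)
    print(calls)
```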

With reference to the body phase, the body phase is based on identifying a predetermined percentage of chunks being seeded on the network. Upon said determination, the implementation can reduce the number of cloud computing platform storage download calls for the data in order to retrieve chunks from peer nodes instead. The body phase as described herein advantageously increases the availability of chunks on the network. The body phase can continue until a predefined percentage of peer nodes have finished downloading from the cloud computing platform storage. It is contemplated that the body phase can be followed by a tail phase.

With reference to the tail phase, during the tail phase a node no longer attempts to download each chunk from a single node but rather downloads the same chunk from a plurality of nodes termed tail phase peer nodes. It is contemplated that several tail phase peer node groups can be identified where each tail phase peer node group is used to download the same chunk. In operation, the tail phase comprises selecting a number of tail phase peer nodes from which to download the same chunk. Tail phase peer nodes refer to a number of peer nodes from which a requesting node will download the same chunk of data. As such, a single chunk can be requested from multiple tail phase peer nodes, and one tail phase peer node will deliver the requested chunk the fastest. The chunk from the fastest tail phase peer node is processed and the chunks from the other tail phase peer nodes can be discarded. It is contemplated that a request for a chunk from the slower nodes can be affirmatively cancelled upon receipt of the chunk from the fastest node.
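A minimal sketch of the tail phase request pattern follows, assuming a thread pool is used to issue the duplicate requests. The names `fetch_from_fastest` and `simulated_fetch` are hypothetical, and `simulated_fetch` is only a stand-in for a real peer transfer with made-up delays. Note that cancelling a Python future only prevents work that has not started yet; an affirmative cancellation of an in-flight transfer would need support from the transport.

```python
import time
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait


def fetch_from_fastest(chunk_id, peers, fetch):
    """Request the same chunk from every tail phase peer; keep the first reply."""
    pool = ThreadPoolExecutor(max_workers=len(peers))
    futures = {pool.submit(fetch, peer, chunk_id): peer for peer in peers}
    pending = set(futures)
    try:
        while pending:
            done, pending = wait(pending, return_when=FIRST_COMPLETED)
            for future in done:
                if future.exception() is None:
                    # The fastest successful peer wins; later copies are discarded.
                    return futures[future], future.result()
        raise RuntimeError("no tail phase peer delivered the chunk")
    finally:
        for future in pending:
            future.cancel()  # best effort: only stops requests not yet started
        pool.shutdown(wait=False)


def simulated_fetch(peer, chunk_id):
    """Hypothetical stand-in for a peer download, with fixed artificial delays."""
    delay = {"peer-a": 0.30, "peer-b": 0.02, "peer-c": 0.15}[peer]
    time.sleep(delay)
    return f"chunk-{chunk_id}-from-{peer}".encode()
```

Calling `fetch_from_fastest(7, ["peer-a", "peer-b", "peer-c"], simulated_fetch)` returns the reply from whichever simulated peer answers first.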

With reference to FIG. 3, the flexible topology definition component 360 can support flexible configuration of the functionality supported herein. The flexible configuration scheme adapts to different client device request requirements (e.g., customer scenarios) without necessitating alteration of the code of the global cache framework. For example, configuration settings can be associated with a particular customer, account, or specified attribute (e.g., time of day or location) such that when the customer, account, or specified attribute is identified by the flexible topology definition component, particular configuration settings are configured for downloading the requested data.

The flexible topology configuration component 360 utilizes a download operation topology schema (“schema”) that supports the data cache proxy system. The schema can be used to generate a configuration instance that corresponds to a download operation that is executed using the configuration settings in the configuration instance. The schema may be applied in several different types of implementations that include a broad schema-based representation of specific aspects of a download operation. The schema (e.g., an XML schema) may define constraints (e.g., physical and logical elements associated with attributes) in the structure and content of the schema that corresponds to a download operation. The constraints can be expressed using grammatical rules governing the order of elements, data types of elements and programmatic definition of attributes, and additional specialized rules as described herein.

The schema can be defined using a download operation topology definition schema template language (“template language”). The template language may be a markup language (e.g., Extensible Markup Language, or XML) that defines a set of rules for encoding the schema in a human-readable and machine-readable format. The template language may specifically support different template elements that comprise variable placeholders for a plurality of template elements. One or more of the template elements can be used to generate a configuration instance for a download operation. The configuration instance for a download operation in one embodiment may be a simple configuration text file in which download operation variables are defined using template elements. It is contemplated that the schema may be represented in a single file as a configuration instance.

Template elements may be predefined in the template and additional template elements may be dynamically populated from the one or more components in the data proxy system. Template elements in the download operation configuration instance may be associated with different features in the template language that facilitate evaluating the template elements to support performing the download operation. A sample download operation configuration instance is reproduced below with different template elements (e.g., collaboration cache proxy configuration settings, collaboration download blobs, blob, storage credentials, and download settings) and corresponding components and their attributes. In this regard, the constraints in the definition of a download operation can facilitate downloading data from peer nodes or the cloud computing platform storage based on the specific constraints or configuration settings defined in a configuration instance of a download operation.

It is contemplated that the download operation configuration instance is associated with an account (e.g., customer) such that configuration settings are identified based on cloud computing platform requirements defined for that account. For example, service-level agreements associated with the customer account can facilitate defining the configuration settings of the configuration instance. For example, a data access service-level agreement (e.g., data request response times) can implicate the specific configuration settings for an instance.

<?xml version="[Value]" encoding="[Value]"?>
<collaborativeCacheProxyConfigSettings version="[Value]">
  <collaborativeDownloadBlobs>
    <blob uri="https://[Path]">
      <storageCredentials mode="[Value]">
        <sharedKeyCredentials keyValue="[Value]" />
      </storageCredentials>
    </blob>
    <blob uri="https://[Path]">
      <storageCredentials mode="[Value]">
        <sasTokenCredentials token="[Value]" />
      </storageCredentials>
    </blob>
    <blob uri="https://[Path]">
      <storageCredentials mode="Anonymous" />
    </blob>
  </collaborativeDownloadBlobs>
  <downloadSettings
    CacheFilePath="[Path]"
    UseSingleThreadForFileReadAndWrite="[True or False]"
    MaxMemoryConsumptionByBuffersInChunkSizeUnit="[Value]"
    SmallBufferSize="[Value]"
    MaxNumberOfSmallBuffers="[Number]"
    InMemoryCacheMode="[Value]"
    ExchangeNodeListAndChunkMapInterval="[Value]"
    PeerDownloaderSleepPeriod="[Value]"
    BlobDownloaderSleepPeriod="[Value]"
    DirectlyDownloadFromBlobThreshHold="[Value]"
    NumberOfBlobDownloaders="[Value]"
    MinSelectedPeers="[Value]"
    MaxSelectedPeers="[Value]"
    SelectPeersRatio="[Value]"
    PeerSender_ReadTimeout="[Value]"
    PeerSender_ReconnectLimit="[Value]"
    TcpConnectionDefaultWriteTimeout="[Value]"
    PeerReceiver_ReadTimeout="[Value]"
    ControlService_ReadTimeout="[Value]"
    PeerCommunicationManager_NumberOfRoundsToEmptyBlockList="[Value]"
    MaxConcurrentConnectionsForControlService="[Value]"
    MaxConcurrentConnectionsForP2PServiceMargin="[Value]"
    ConnectionAcceptor_DelayBeforeAcceptNewClientWhenThrottling="[Value]"
    ValidateMD5AfterDownloadingFromBlob="[True or False]"
    ValidateMD5AfterDownloadingFromPeers="[True or False]"
    ValidateFullFileMD5AfterDownloadComplete="[True or False]"
  />
</collaborativeCacheProxyConfigSettings>

Configuration Name: Description

CacheFilePath: Location to store local cache data.

UseSingleThreadForFileReadAndWrite: This setting controls whether to use a single actor for both reading and writing a file. Depending on the type of device used for storing the destination file, performance could be improved by using a single thread to do both file reads and writes to limit contention.

MaxMemoryConsumptionByBuffersInChunkSizeUnit: This setting controls the maximum memory consumption by large buffers (which are used to read chunks from the network stream or local file), in units of a chunk size.

SmallBufferSize: This setting controls the size of a small buffer, which is used to store metadata communication between peers.

InMemoryCacheMode: This setting configures which chunks are cached in memory; the default mode is LRU (Least Recently Used). The LRU eviction policy depends only on information about the cache on the local machine, as do a number of other options:
  LRU (Least Recently Used): For each in-memory chunk, the data proxy component can keep track of when it was last written or served to a peer node. When the data proxy component needs to evict chunks from the cache, it will evict one of the least recently used cache chunks.
  LFU (Least Frequently Used): For each in-memory chunk, the data proxy component keeps track of how many times it has been accessed (for serving to peers). When the data proxy component needs to evict chunks from the cache, it will evict those that are served least frequently.
  FIFO (First In First Out): When the data proxy component needs to evict chunks from the cache, it will evict them strictly based on the order in which they were added to the cache.
  RR (Random Replacement): When the data proxy component needs to evict chunks from the cache, it will select a random chunk to be evicted.
In addition to the earlier mentioned options, the data proxy component can also provide an option that looks at the data available on peers as well:
  MGA (Most Globally Available): When a chunk needs to be evicted from the cache, the data proxy component will evict chunks based on how many copies of each chunk are available across all peer nodes.

ExchangeNodeListAndChunkMapInterval: This setting controls how frequently the node list and chunk map information should be exchanged with peers.

DirectlyDownloadFromBlobThreshHold: This setting controls the blob downloader so that it downloads a chunk from blob storage directly only if the chunk's “AvailableOnNumberOfNodes” property is below this threshold.

NumberOfBlobDownloaders: This setting controls how many blob downloaders are created by the download controller for a single download session to download chunks directly from the blob.

MinSelectedPeers, MaxSelectedPeers, SelectPeersRatio: These three settings control how many peers the peer communication manager will try to connect to and establish a peer relationship with. K = total nodes. P = Min(MaxSelectedPeers, Max(MinSelectedPeers, (K * SelectPeersRatio)))

PeerSender_ReconnectLimit: This setting controls how many consecutive times PeerSender will try to reconnect to the remote endpoint (e.g., if a PeerSender fails to connect to the remote endpoint this many times in a row, it will stop itself).

PeerCommunicationManager_NumberOfRoundsToEmptyBlockList: This setting controls after how many rounds of “DoWork” the PeerCommunicationManager empties its block list. This means the temporary block list is emptied every (EnsureConnectionToPeersInterval * this value). By default this is 10000 milliseconds * 15 = 150 s.

MaxConcurrentConnectionsForP2PServiceMargin: This setting controls the margin (difference) between the maximum concurrent connections for the P2P service and the number of peers to select.

ConnectionAcceptor_DelayBeforeAcceptNewClientWhenThrottling: This setting controls how long the ConnectionAcceptor instance delays before accepting new clients when the maximum concurrent connections limit is reached.
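The peer count formula and the default block list interval from the settings above can be worked through in a short sketch. The function names and the sample values are hypothetical, and truncating K * SelectPeersRatio to an integer is an assumption, since the description does not state how a fractional result is handled.

```python
def peers_to_select(total_nodes, min_selected_peers, max_selected_peers,
                    select_peers_ratio):
    """P = Min(MaxSelectedPeers, Max(MinSelectedPeers, K * SelectPeersRatio))."""
    # Assumption: a fractional K * SelectPeersRatio is truncated.
    by_ratio = int(total_nodes * select_peers_ratio)
    return min(max_selected_peers, max(min_selected_peers, by_ratio))


def block_list_reset_seconds(ensure_connection_interval_ms, rounds):
    """Time between block list resets: EnsureConnectionToPeersInterval * rounds."""
    return ensure_connection_interval_ms * rounds / 1000


# Illustrative values only: MinSelectedPeers=4, MaxSelectedPeers=16, ratio=0.1.
small_group = peers_to_select(10, 4, 16, 0.1)     # ratio gives 1, raised to the minimum
medium_group = peers_to_select(100, 4, 16, 0.1)   # ratio gives 10, used as is
large_group = peers_to_select(1000, 4, 16, 0.1)   # ratio gives 100, capped at the maximum
default_reset = block_list_reset_seconds(10000, 15)
```

The minimum keeps small peer groups well connected, while the maximum bounds the number of connections a node maintains in large groups.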

With continued reference to FIG. 3, the flexible topology configuration component 360 can be implemented using the components in the data proxy operations components 320. Each of the components can perform specific operations to facilitate an adaptable operating environment that supports specific client requirements. For example, the data download component 322 can be configured to download data or chunks of data. Data can include a file or a blob or portions thereof in defined chunks. In particular, the data download component 322 can be implemented as a worker thread to download blob chunks from a cloud computing platform storage or a peer node. It is contemplated that a plurality of peer nodes can define a particular download group of peer nodes that operate together.

The download controller 324 can be configured to manage a life cycle of a download. The life cycle of a download for a particular data request can include several defined steps. For example, the peer sender component 328 can provide an actor that makes calls to establish a communication channel and communicate with peer nodes and the cloud computing platform. The peer communication manager 328 can maintain (e.g., clean) the metadata tables, as discussed herein in more detail, and facilitate selecting communication peer nodes. Further, the connection acceptor component 330 can accept incoming connections from peer nodes. The download controller 332 can be used by multiple downloaders when downloading multiple target data items (e.g., files or blobs) concurrently. The cache manager 332 is responsible for caching data based on the download operation configuration instance settings.

With reference to FIG. 3, the metadata component 380 is responsible for managing and implementing a metadata workflow to facilitate functionality of the embodiments described herein. In particular, the metadata component 380 stores and communicates metadata tables in corresponding peer nodes to support scalability in the data cache proxy system. The implementation of metadata tables supports scalability based on efficient communication of peer nodes with each other and the cloud computing platform based on the metadata table on each node.

Turning to FIG. 5A, by way of example, a metadata table 530 can be implemented as a 2-dimensional table on a node 520. The metadata table maintains the availability of each data chunk of the data download 512 (e.g., blob) on every node inside a peer group. The availability data for the data chunks can be maintained in real time. Each cell represents a metadata entry. For cell 532 in the 1st row and 4th column, a designation of 1 can indicate that the corresponding data chunk (chunk 4 534) is cached by node 1 536. The 1 indicates that the data chunk is available to be downloaded from the node. It is contemplated that the designation that a chunk is available disregards whether the node is busy. A designation of 0 538 indicates that the data chunk (chunk 3 540) is not available at the node. The metadata component adds a new row to the table when a new peer node is added to the peer group, and also adds a new metadata table for a new download. For example, a metadata table can be associated with an individual blob.
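As a non-limiting sketch, the 2-dimensional metadata table can be modeled as one row of 0/1 designations per node and one column per chunk. The class and method names are hypothetical, and chunk positions are zero-based in the sketch, so chunk 4 of the example corresponds to index 3.

```python
class MetadataTable:
    """Availability of each chunk of one download on every node in a peer group."""

    def __init__(self, download_id, chunk_count):
        self.download_id = download_id
        self.chunk_count = chunk_count
        self.rows = {}  # node_id -> list of 0/1 designations, one per chunk

    def add_node(self, node_id):
        """Add a row when a new peer node joins the peer group."""
        self.rows.setdefault(node_id, [0] * self.chunk_count)

    def set_available(self, node_id, chunk_index, available=True):
        """Record whether the chunk is cached (1) or not cached (0) on the node."""
        self.add_node(node_id)
        self.rows[node_id][chunk_index] = 1 if available else 0

    def nodes_with_chunk(self, chunk_index):
        """Nodes from which the chunk can be downloaded, busy or not."""
        return [node for node, row in self.rows.items() if row[chunk_index] == 1]


# One table per download (e.g., per blob); identifiers here are illustrative.
table = MetadataTable("blob-1", chunk_count=5)
table.add_node("node-1")
table.add_node("node-2")
table.set_available("node-1", 3)  # chunk 4 is cached by node 1
```

A requesting node could consult `nodes_with_chunk` before deciding whether a chunk is downloadable from a peer or must come from the cloud computing platform storage.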

The metadata component 380 is also responsible for managing entries in the metadata table based on chunks (e.g., chunk 1 540) transferred to a node. The metadata entry can be associated with several defined fields that define attributes of the metadata entry. By way of example, fields can include a property field (e.g., Download ID, Node ID, Chunk ID, Timestamp, and IsInsertion) corresponding to a type field (Download ID, std::string, uint32_t, uint64_t, and bool), and a description field. The Download ID indicates an identifier for the download, the Node ID indicates an identifier for the node on which the metadata entry originates, the Chunk ID indicates a particular chunk of a plurality of chunks associated with a download, the Timestamp indicates a time when the metadata change occurred, and IsInsertion indicates whether the change is an insertion or a deletion. It is contemplated that when a chunk is inserted into the cache or deleted from the cache, a corresponding metadata entry can be generated. The metadata entry can be propagated to peer nodes after a synchronization operation.
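The metadata entry fields listed above can be sketched as an immutable record. The class name and the range checks are hypothetical; representing the Download ID as a string is an assumption, because the description does not give a concrete type for it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetadataEntry:
    download_id: str    # assumption: the source does not state a concrete type
    node_id: str        # std::string in the description
    chunk_id: int       # uint32_t in the description
    timestamp: int      # uint64_t in the description
    is_insertion: bool  # True for an insertion, False for a deletion

    def __post_init__(self):
        # Mirror the unsigned integer widths given in the description.
        if not 0 <= self.chunk_id < 2 ** 32:
            raise ValueError("chunk_id must fit in uint32_t")
        if not 0 <= self.timestamp < 2 ** 64:
            raise ValueError("timestamp must fit in uint64_t")


# Illustrative entry: node-1 cached chunk 4 of download "blob-1" at time 100.
entry = MetadataEntry("blob-1", "node-1", 4, 100, True)
duplicate = MetadataEntry("blob-1", "node-1", 4, 100, True)
unique_entries = {entry, duplicate}
```

Because the record is frozen, identical entries compare and hash equal, so a receiving node can discard duplicates by collecting entries in a set.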

A metadata entry refers to a basic data element in a metadata exchange operation. The metadata exchange operation can be performed between two peer nodes communicating metadata corresponding to data stored at the peer node. A timestamp can be attached to each metadata entry. Timestamps can be durable and can also persist across processes. Nodes may use the timestamps to resolve conflicts and discard duplicates in received metadata entries.

Metadata tables can be maintained in the memory of a peer node. An in-memory table on each peer node can correspond to each data download. The table can include several portions (e.g., a global sorted chunk list, an insertion table, and a deletion table). Specifically, with reference to FIG. 5B, the table can include a global sorted chunk list 550 that indicates the union of the sets of all the chunks that are downloaded by all nodes. The global sorted chunk list can be sorted by the chunk copy count.

The table can further include an insertion table 560 that contains all insertion metadata entries generated by the node itself and peer nodes. The table can reflect the view of each individual cache manager (e.g., cache manager 332) in each peer node, as each insertion corresponds to an available chunk that is cached. The table can provide information corresponding to queries from a data downloader (e.g., data downloader 332).

The table also includes a deletion table 570 that preserves deletions in memory. Maintaining a deletion table facilitates handling metadata conflicts. For example, suppose a chunk was cached by Node 1 at time t1, and then removed later at time t2 (where t1<t2). Due to a network delay, the deletion metadata entry may be propagated to Node 2 earlier than the insertion. The deletion table can be referenced to reconcile such differences in information: when the late insertion arrives, the recorded deletion shows that the chunk is no longer cached by Node 1. To keep the deletion table from growing unbounded, the table can be stored using a first-in-first-out cache mechanism. Upon meeting a predefined threshold, new insertions in the deletion table can cause the oldest deletion record to be removed from the table.
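The out-of-order scenario above can be sketched with an insertion table and a bounded first-in-first-out deletion table. The class name, the method names, and the default bound of 1024 records are hypothetical; the sketch also assumes that a deletion is never older than the insertion it removes.

```python
from collections import OrderedDict


class ChunkAvailability:
    """Insertion table plus a FIFO-bounded deletion table for one download."""

    def __init__(self, max_deletions=1024):
        self.insertions = {}            # (node_id, chunk_id) -> insertion timestamp
        self.deletions = OrderedDict()  # (node_id, chunk_id) -> deletion timestamp
        self.max_deletions = max_deletions

    def apply(self, node_id, chunk_id, timestamp, is_insertion):
        """Apply one metadata entry; return False if it is stale and ignored."""
        key = (node_id, chunk_id)
        if is_insertion:
            deleted_at = self.deletions.get(key)
            if deleted_at is not None and deleted_at >= timestamp:
                # The deletion arrived first: the late insertion is stale.
                return False
            self.insertions[key] = timestamp
            return True
        self.insertions.pop(key, None)
        self.deletions[key] = timestamp
        while len(self.deletions) > self.max_deletions:
            self.deletions.popitem(last=False)  # drop the oldest deletion record
        return True

    def is_available(self, node_id, chunk_id):
        return (node_id, chunk_id) in self.insertions
```

With this structure, a deletion at t2 that reaches Node 2 before the insertion at t1 is remembered, so the late insertion does not wrongly mark the chunk as available.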

The metadata component 380 is responsible for determining whether a received chunk of data at the node is applicable, or in other words, is a chunk that is tracked for the data proxy system. When the data chunk is applicable, the metadata component applies a corresponding metadata entry to the metadata table and updates the global sorted chunk list.

An initialization component 382 is responsible for managing a life cycle of the metadata recorded for data downloads. Specifically, the metadata component 380 performs an initialization operation to build a metadata entry table for a corresponding node. The initialization operation includes retrieving tables from peer nodes when the node joins the collaboration data proxy network. Initialization operations can be performed during startup of a node or on failovers. An initialization operation can include specific actions for building the metadata entry table. Actions can include pulling a metadata entry table from one or more selected peer nodes. It is contemplated that the metadata entry table in an individual node can differ due to latency in the collaboration data proxy network. The pulled metadata entry tables can be in the form of batched updates. The metadata component can merge all the updates and apply them to the in-memory table. Initializing can further include a cache manager detecting pre-cached chunks and notifying the metadata component 380 about their existence. The metadata component 380 needs to send out all metadata entries when a particular node is a brand new node from the existing nodes' point of view.

An update component 384 can be responsible for managing updates communicated between the nodes. Nodes send updates (e.g., update 580) between each other to communicate metadata entries. Metadata entries can originate from the node itself. Metadata entries in one update can have different timestamps and types (insertion or deletion). A node ID may be implemented by the update component. The node ID corresponds to the metadata entries to identify the origin node. It is contemplated that an update comprises both types of metadata changes: an insertion change and a deletion change. In cases where an insertion and a deletion corresponding to the same chunk appear in the same metadata sync interval, the insertion and deletion can be ignored. Chunks of data can be downloaded once by a single node, where each chunk has only one insertion and at most one deletion.
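Assembling an update for one sync interval can be sketched as follows. The function name, the tuple layout `(chunk_id, timestamp, is_insertion)`, and the dictionary shape of the update are hypothetical and chosen only for illustration.

```python
def build_update(node_id, pending_changes):
    """Build one update from the local changes of a single sync interval.

    pending_changes: list of (chunk_id, timestamp, is_insertion) tuples
    recorded on this node since the previous synchronization.
    """
    inserted = {chunk for chunk, _, is_insertion in pending_changes if is_insertion}
    deleted = {chunk for chunk, _, is_insertion in pending_changes if not is_insertion}
    # A chunk inserted and deleted within the same interval nets out to nothing,
    # so both of its entries are dropped from the update.
    cancelled = inserted & deleted
    entries = [change for change in pending_changes if change[0] not in cancelled]
    return {"node_id": node_id, "entries": entries}


# Illustrative interval: chunk 1 is cached and evicted before the next sync.
update = build_update("node-1", [
    (1, 100, True),
    (2, 110, True),
    (1, 120, False),
    (3, 130, False),
])
```

Dropping cancelled pairs keeps updates small and avoids announcing chunks that peers could never download.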

A maintenance component 386 is responsible for managing maintenance of the in-memory table. The maintenance component 386 facilitates performing one or more maintenance operations on the in-memory table. In particular, maintenance operations performed by the maintenance component can be initiated based on identifying particular maintenance triggers. Maintenance triggers can include local metadata changes and messages from peers (e.g., periodic sync messages, response messages to startup pulling requests, and a NAK (negative acknowledgment) received when the download controller 324 tries to download a chunk from a peer). The maintenance component 386 can further be responsible for applying metadata entries. For example, a metadata entry can be applied to the in-memory table when the same or a newer version of the metadata entry does not exist in the in-memory table. The applicable metadata entries, except those returned for a pulling request, can be sent to a message sender component and later to peers for periodic synchronization. Metadata entries about a NAK can also be forwarded to communicate the information available therein. When a metadata entry is not applicable, the download controller 324 can drop the metadata entry. A node will send out a metadata entry at most once.

Turning now to FIG. 6, a flow diagram is provided that illustrates a method 600 for enhanced access to data in distributed storage systems. Initially at block 610, metadata tables in one or more peer nodes are referenced by a requesting node for data in a corresponding data request. The metadata tables indicate availability of chunks of data in the one or more peer nodes. At block 620, a determination that the data corresponding to the data request is downloadable from the one or more peer nodes is made based on the metadata tables. At block 630, a download operation configuration instance for the data request is generated. The download operation configuration instance includes configuration settings for downloading the data corresponding to the data request. At block 640, a chunk of data is downloaded from the one or more peer nodes using the configuration settings of the download operation configuration instance.

Turning now to FIG. 7, a flow diagram is provided that illustrates a method 700 for enhanced access to storage data in a distributed storage system. At block 710, a determination that a data request, associated with a requesting node, is directed to collaboration data is made. Collaboration data comprises data that is stored on at least one of: a cloud computing platform storage or one or more peer nodes in a collaboration proxy system. At block 720, metadata tables in the one or more peer nodes for data corresponding to the data request are referenced. Referencing the metadata tables corresponding to the one or more peer nodes is based on an established communication channel with the one or more peer nodes. At block 730, a determination is made that at least a chunk of the data corresponding to the data request is downloadable from the one or more peer nodes. The determination is based on the metadata tables. At block 740, a download operation configuration instance for the data request of the requesting node is generated. The download operation configuration instance comprises configuration settings for downloading data corresponding to the data request from the one or more peer nodes. At block 750, the chunk of the data is downloaded from the one or more peer nodes using the configuration settings of the download operation configuration instance, where downloading the chunk of data is based on a long-tail mitigation routine.

Having briefly described an overview of embodiments of the present invention, an exemplary operating environment in which embodiments of the present invention may be implemented is described below in order to provide a general context for various aspects of the present invention. Referring initially to FIG. 8 in particular, an exemplary operating environment for implementing embodiments of the present invention is shown and designated generally as computing device 800. Computing device 800 is but one example of a suitable computing environment and is not intended to suggest any limitation as to the scope of use or functionality of the invention. Neither should the computing device 800 be interpreted as having any dependency or requirement relating to any one or combination of components illustrated.

The invention may be described in the general context of computer code or machine-useable instructions, including computer-executable instructions such as program modules, being executed by a computer or other machine, such as a personal data assistant or other handheld device. Generally, program modules including routines, programs, objects, components, data structures, etc. refer to code that performs particular tasks or implements particular abstract data types. The invention may be practiced in a variety of system configurations, including hand-held devices, consumer electronics, general-purpose computers, more specialty computing devices, etc. The invention may also be practiced in distributed computing environments where tasks are performed by remote-processing devices that are linked through a communications network.

With reference to FIG. 8, computing device 800 includes a bus 810 that directly or indirectly couples the following devices: memory 812, one or more processors 814, one or more presentation components 816, input/output ports 818, input/output components 820, and an illustrative power supply 822. Bus 810 represents what may be one or more busses (such as an address bus, data bus, or combination thereof). Although the various blocks of FIG. 8 are shown with lines for the sake of clarity, in reality, delineating various components is not so clear, and metaphorically, the lines would more accurately be grey and fuzzy. For example, one may consider a presentation component such as a display device to be an I/O component. Also, processors have memory. We recognize that such is the nature of the art, and reiterate that the diagram of FIG. 8 is merely illustrative of an exemplary computing device that can be used in connection with one or more embodiments of the present invention. Distinction is not made between such categories as “workstation,” “server,” “laptop,” “hand-held device,” etc., as all are contemplated within the scope of FIG. 8 and reference to “computing device.”

Computing device 800 typically includes a variety of computer-readable media. Computer-readable media can be any available media that can be accessed by computing device 800 and includes both volatile and nonvolatile media, removable and non-removable media. By way of example, and not limitation, computer-readable media may comprise computer storage media and communication media.

Computer storage media include volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer-readable instructions, data structures, program modules or other data. Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by computing device 800. Computer storage media excludes signals per se.

Communication media typically embodies computer-readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism and includes any information delivery media. The term “modulated data signal” means a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal. By way of example, and not limitation, communication media includes wired media such as a wired network or direct-wired connection, and wireless media such as acoustic, RF, infrared and other wireless media. Combinations of any of the above should also be included within the scope of computer-readable media.

Memory 812 includes computer storage media in the form of volatile and/or nonvolatile memory. The memory may be removable, non-removable, or a combination thereof. Exemplary hardware devices include solid-state memory, hard drives, optical-disc drives, etc. Computing device 800 includes one or more processors that read data from various entities such as memory 812 or I/O components 820. Presentation component(s) 816 present data indications to a user or other device. Exemplary presentation components include a display device, speaker, printing component, vibrating component, etc.

I/O ports 818 allow computing device 800 to be logically coupled to other devices including I/O components 820, some of which may be built in. Illustrative components include a microphone, joystick, game pad, satellite dish, scanner, printer, wireless device, etc.

Embodiments presented herein have been described in relation to particular embodiments which are intended in all respects to be illustrative rather than restrictive. Alternative embodiments will become apparent to those of ordinary skill in the art to which the present invention pertains without departing from its scope.

From the foregoing, it will be seen that this invention is one well adapted to attain all the ends and objects hereinabove set forth together with other advantages which are obvious and which are inherent to the structure.

It will be understood that certain features and sub-combinations are of utility and may be employed without reference to other features or sub-combinations. This is contemplated by and is within the scope of the claims.

The invention claimed is:
 1. A system for providing enhanced access to data in distributed storage systems, the system comprising: one or more hardware processors; memory configured for providing computer program instructions to the one or more hardware processors; a collaboration data proxy component configured to execute on the one or more hardware processors for: determining, using a metadata table stored at a requesting node, whether data corresponding to a data request of the requesting node is downloadable from one or more peer nodes of the requesting node, wherein each node in a peer group comprising the requesting node and the one or more peer nodes includes a metadata table that indicates availability of chunks of data in the peer group; and downloading a chunk of the data from the one or more peer nodes, when the chunk is located on the one or more peer nodes, wherein downloading the data is based on a download operation; and a long-tail mitigation component configured to execute on the one or more hardware processors for: adjusting the download operation based on a long tail mitigation routine to facilitate downloading the chunk of data by using at least one of: a contention avoidance workflow or an increased download throughput workflow; wherein the contention avoidance workflow comprises avoiding contention during the download operation based on a chunk selection scheme, wherein the chunk selection scheme comprises prioritizing downloading chunks having relatively fewer peer nodes downloading a selected chunk; and wherein the increased download throughput workflow comprises a plurality of phases, each phase corresponding to a defined percentage of data chunks of the data downloaded, and each phase defining a different adjustment to the download operation.
 2. The system of claim 1, further comprising the collaboration data proxy component configured for: determining whether the data request from the requesting node is directed to collaboration data, wherein collaboration data comprises data that is stored on at least one of: a cloud computing platform storage or one or more peer nodes; forwarding the data request to a data source corresponding to the data request, when the request is not directed to collaboration data; and processing the data request based on availability of the data on the cloud computing platform or the one or more peers nodes, when the request is directed to collaboration data.
 3. The system of claim 1, further comprising a metadata component configured to execute on the one or more hardware processors for: initializing a selected node by retrieving metadata tables from the nodes in the peer group when the selected node joins the peer group; detecting pre-cached chunks in the selected node to populate metadata entries in the metadata tables; communicating updates in the metadata tables to the nodes in the peer group, wherein the updates comprise one of: an insertion change and a deletion change; and maintaining the metadata tables based on maintenance triggers comprising negative acknowledgement communications.
 4. The system of claim 1, wherein the plurality of phases include a head phase having a defined head phase percent of data chunks of the data downloaded, wherein during the head phase the long-tail mitigation component is configured for: upon determining that a node receiving the data is being throttled at a cloud computing platform storage, throttling a number of network calls for downloading from the cloud computing platform storage and triggering network calls for downloading from one or more peer nodes; and upon determining that no throttling is occurring at the cloud computing platform storage, increasing a number of network calls to the cloud computing platform storage to download.
 5. The system of claim 1, wherein the plurality of phases include a body phase having a predefined body phase percent of data chunks of the data downloaded, wherein during the body phase the long-tail mitigation component is configured for: determining that the predefined body phase percent of data chunks downloaded has been met; reducing a number of network calls to the cloud computing platform storage; and increasing a number of network calls to the one or more peer nodes.
 6. The system of claim 1, wherein the plurality of phases include a tail phase having a predefined body phase percent of data chunks of the data downloaded, wherein during the tail phase the long-tail mitigation component is configured for: selecting a number of tail phase peer nodes in a tail phase peer node group, wherein the tail phase peer nodes are selected for downloading a selected tail phase chunk; communicating a request for the selected tail phase chunk to each node in the tail phase node group; and processing a chunk received earliest from one of the peer nodes in the tail phase node group.
 7. The system of claim 1, further comprising a flexibility configuration component configured to execute on the one or more hardware processors for: generating a download operation configuration instance for the data request of the requesting node, wherein the download operation instance comprises configuration settings of the download operation for downloading data corresponding to the data request from the one or more peer nodes.
 8. The system of claim 7, wherein the download operation configuration instance comprises a plurality of configuration settings programmatically defined using template elements as constraints for performing the download operation.
 9. One or more computer storage media having computer-executable instructions embodied thereon that, when executed, by one or more processors, causes the one or more processors to perform a method for enhanced access to data in distributed storage systems, the method comprising: referencing metadata tables corresponding to a peer group comprising peer nodes for data corresponding to a data request, wherein each peer node in the peer group includes a metadata table that indicates availability of chunks of data in the peer group; determining that the data corresponding to data request is downloadable from one or more of the peer nodes, wherein the determination is based on the metadata tables; prioritizing downloading chunks of the data having relatively fewer peer nodes downloading a selected chunk; generating a download operation configuration instance for the data request, wherein the download operation instance comprises configuration settings for downloading data corresponding to the data request; and downloading a chunk of the data from the one or more peer nodes using the configuration settings of the download operation configuration instance.
 10. The media of claim 9, wherein metadata tables include each of a globally sorted chunk list, an insertion table, and a deletion table.
 11. The media of claim 9, wherein the download operation configuration instance comprises a plurality of configuration settings programmatically defined using template elements as constraints for performing the download operation.
 12. The media of claim 11, wherein an in-memory-cache-mode template element determines which chunks of data are cached in memory, and wherein an in-memory-cache-mode is selected from one of the following: a Least Recently Used mode, a Least Frequently Used mode, a First In First Out scheme, a Random Replacement scheme, a Most Globally Available Scheme.
 13. The media of claim 9, wherein the download operation configuration instance is associated with an account such that the configuration settings are identified based on cloud computing platform data access service level agreements defined for that account.
 14. A computer-implemented method for enhanced access to data in distributed storage systems, the method comprising: determining that a data request, associated with a requesting node, is directed to collaboration data, wherein collaboration data comprises data that is stored on at least one of: a cloud computing platform storage or one or more peer nodes in a collaboration proxy system; referencing metadata tables stored one at each peer node for data corresponding to the data request, wherein referencing the metadata tables is based on an established communication channel with the peer nodes; determining that at least a chunk of the data corresponding to the data request is downloadable from one or more of the peer nodes, wherein the determination is based on the metadata tables; generating a download operation configuration instance for the data request of the requesting node, wherein the download operation instance comprises configuration settings for downloading data corresponding to the data request from the one or more peer nodes; and downloading a chunk of the data from the one or more peer nodes using the configuration settings on the download operation configuration instance and by using a long-tail mitigation routine configured to adjust a download operation of the downloading using an increased download throughput workflow comprising a plurality of phases, each phase corresponding to a defined percentage of data chunks of the data downloaded, and each phase defining a different adjustment to the download operation.
 15. The method of claim 14, further comprising: initializing the build of metadata tables on the one or more peer nodes when initializing a selected node, wherein initializing the selected node comprises retrieving metadata tables from one or more of the peer nodes in a peer group when the selected node joins the peer group; detecting pre-cached chunks in the selected node to populate metadata entries in the metadata tables; communicating updates in the metadata tables to each peer node in the peer group, wherein the updates comprise one of: an insertion change and a deletion change; and maintaining the metadata tables based on maintenance triggers comprising negative acknowledgement communications.
 16. The method of claim 14, wherein the download operation configuration instance comprises a plurality of configuration settings programmatically defined using template elements as constraints for performing the download operation, wherein an in-memory-cache-mode template element determines which chunks of data are cached in memory based on a Most Globally Available scheme.
 17. The method of claim 14, wherein the long-tail mitigation routine is additionally configured to adjust the download operation using a contention avoidance workflow comprising prioritizing downloading chunks having relatively fewer peer nodes downloading a selected chunk.
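
By way of illustration only, and without limiting the claims, the chunk selection scheme of the contention avoidance workflow and the phase-based structure of the increased download throughput workflow might be sketched as follows. The function names, the `in_progress` mapping, and the 20% and 90% phase thresholds are hypothetical and chosen for this example; the claims do not specify them.

```python
from typing import Dict, Iterable, Optional, Set


def select_next_chunk(
    needed: Iterable[str],
    in_progress: Dict[str, Set[str]],
) -> Optional[str]:
    """Pick the needed chunk that the fewest peer nodes are currently downloading.

    `in_progress` is an assumed view (e.g. derived from metadata tables) that
    maps a chunk id to the set of peer node ids currently downloading it.
    """
    candidates = sorted(needed)  # sorted so that ties break deterministically
    if not candidates:
        return None
    return min(candidates, key=lambda c: len(in_progress.get(c, set())))


def current_phase(
    done: int,
    total: int,
    head_pct: float = 0.2,  # hypothetical head phase percentage
    body_pct: float = 0.9,  # hypothetical body phase percentage
) -> str:
    """Classify download progress into a head, body, or tail phase."""
    if total <= 0:
        raise ValueError("total must be positive")
    fraction = done / total
    if fraction < head_pct:
        return "head"
    if fraction < body_pct:
        return "body"
    return "tail"


if __name__ == "__main__":
    busy = {"c1": {"n1", "n2"}, "c2": {"n1"}}
    # "c3" is not being downloaded by any peer, so it is the least contended.
    print(select_next_chunk(["c2", "c1", "c3"], busy))
    print(current_phase(done=5, total=10))
```

In such a sketch, each phase label would then select a different adjustment to the download operation, for example shifting network calls from the cloud computing platform storage toward peer nodes once the body phase is reached.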